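Hello, today is Day 2. You are probably wondering what this title is about. Before the Ironman challenge started, I had already spent a week on the course How Google does Machine Learning, and since I am only two assignments (Labs) away from earning the course certificate, I want to get this Lab out of the way today.
Forgive me for sharing things out of order; I just can't wait to see what the certificate looks like > <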
Lab: Analyzing data using Datalab and BigQuery
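1. First, click Start Lab, sign in with the provided account and password, and confirm the project ID.
2. Open Cloud Shell and run gcloud compute zones list to see which zones have Google Cloud servers.
3. Based on the regions supported by AI Platform (https://cloud.google.com/ml-engine/docs/tensorflow/regions), we choose east1-a for our machine.
4. Then set up a VM in that zone.
5. Check that the Cloud Source Repositories API is enabled in the project.
6. Go to Navigation menu > BigQuery > top-left-corner menu and click Done.
7. Click More > Query Settings and make sure Legacy is not selected (we are using Standard SQL).
8. Enter the following in the query textbox: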
#standardSQL
SELECT
  departure_delay,
  COUNT(1) AS num_flights,
  APPROX_QUANTILES(arrival_delay, 5) AS arrival_delay_quantiles
FROM
  `bigquery-samples.airline_ontime_data.flights`
GROUP BY
  departure_delay
HAVING
  num_flights > 100
ORDER BY
  departure_delay ASC
9. Click Run to execute the query.
10. Open a notebook and enter:
query="""
SELECT
  departure_delay,
  COUNT(1) AS num_flights,
  APPROX_QUANTILES(arrival_delay, 10) AS arrival_delay_deciles
FROM
  `bigquery-samples.airline_ontime_data.flights`
GROUP BY
  departure_delay
HAVING
  num_flights > 100
ORDER BY
  departure_delay ASC
"""
# Run the query through Datalab's BigQuery module and load the result into a pandas DataFrame
import google.datalab.bigquery as bq
df = bq.Query(query).execute().result().to_dataframe()
df.head()
11. Press Shift+Enter to run the cell.
12. In the next cell, enter:
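import pandas as pd
# Expand each list of deciles into its own set of columns (0, 1, ..., 10)
percentiles = df['arrival_delay_deciles'].apply(pd.Series)
# Rename the columns to "0%", "10%", ..., "100%"
percentiles = percentiles.rename(columns = lambda x : str(x*10) + "%")
# Put departure_delay back next to the percentile columns
df = pd.concat([df['departure_delay'], percentiles], axis=1)
df.head()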
13. Press Shift+Enter to run the cell.
14. In the next cell, enter:
# Drop the minimum (0%) and maximum (100%) columns so outliers do not dominate the plot
without_extremes = df.drop(['0%', '100%'], axis=1)
without_extremes.plot(x='departure_delay', xlim=(-30,50), ylim=(-50,50));
15. Press Shift+Enter to run the cell.
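If you want to try the reshaping from steps 12 to 14 without a BigQuery connection, here is a minimal sketch that imitates the query locally. Everything in it is an assumption for illustration only: the flights table is random synthetic data, the helper name deciles_by_departure_delay is hypothetical, and numpy.quantile computes exact quantiles, whereas APPROX_QUANTILES in BigQuery returns approximate ones. What it does share with the query is the shape of the result: asking for 10 quantiles gives 11 values per row (the minimum, nine decile boundaries, and the maximum).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the flights table (made-up data, not the real dataset).
rng = np.random.default_rng(0)
departure_delay = rng.integers(-5, 6, size=5000)  # whole minutes from -5 to 5
arrival_delay = departure_delay + rng.normal(0, 10, size=5000)
flights = pd.DataFrame({
    "departure_delay": departure_delay,
    "arrival_delay": arrival_delay,
})


def deciles_by_departure_delay(flights, min_flights=100):
    """Hypothetical local analogue of the GROUP BY / HAVING / ORDER BY query."""
    rows = []
    for delay, group in flights.groupby("departure_delay"):
        # HAVING num_flights > 100
        if len(group) <= min_flights:
            continue
        # 11 values: minimum, nine decile boundaries, maximum (exact, not approximate)
        q = np.quantile(group["arrival_delay"], np.linspace(0, 1, 11))
        rows.append({
            "departure_delay": delay,
            "num_flights": len(group),
            "arrival_delay_deciles": q.tolist(),
        })
    # ORDER BY departure_delay ASC
    return pd.DataFrame(rows).sort_values("departure_delay").reset_index(drop=True)


df = deciles_by_departure_delay(flights)

# Same reshaping as in steps 12 and 14: one column per decile, then drop the extremes.
percentiles = df["arrival_delay_deciles"].apply(pd.Series)
percentiles = percentiles.rename(columns=lambda x: str(x * 10) + "%")
wide = pd.concat([df["departure_delay"], percentiles], axis=1)
without_extremes = wide.drop(columns=["0%", "100%"])
```

From here, without_extremes can be plotted the same way as in step 14. Since the data is random, the curves only show the mechanics and say nothing about real flights.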